ABSTRACT


In terms of methodology, this project determines the impact on the power of formal comparisons between groups, based on the final measurement, when the intermediate measurements are included in the analysis via a linear mixed model (covered as a module in ST6034 [1]). This impact on power has been assessed for various numbers of intermediate measurements.

When the assumption of independence is violated because the model contains repeated observations on each subject in each group, it is still reasonable to assume independence between subjects. The model can be developed further so that, when it is fitted, adjustments are made to account for the lack of independence within subjects. This methodology is known as repeated measures ANOVA [2].

Even with these adjustments, repeated measures ANOVA models do not impose compound symmetry on the correlation within subjects. We examine the factors that affect power and the interrelationship between statistical significance, power and effect size, looking at the values that determine power when the other quantities are held fixed. We also consider the mean and standard deviation for a given sample size with the other variables known. The hypotheses, the sampled subjects, the sampling distributions, the hypothesis-test calculations, the confidence intervals and the effect size together determine the power. Finally, we relate these quantities to the statistics used to analyze the data, whether with a correlational model or an experimental design.


Keywords: Linear Mixed Models, Repeated Measure Analysis, GLS, REML. 
I. INTRODUCTION 


The usual p-value calculations in repeated measures ANOVA are accurate if the response variables have compound symmetry. This means that all the response variables have the same variance and each pair of response variables shares a common correlation [3].
Compound symmetry is the simplest covariance structure that includes correlated errors within a subject. It assumes that the correlation between any pair of values from a given subject is the same regardless of how close together or far apart in time they are. Another class of models, called linear mixed models, allows much greater flexibility in the structure applied to (or assumed for) the correlations, and hence for the covariance matrix within subjects.
Linear mixed models are sometimes called multilevel models or hierarchical models, depending on the type of regression model. They include two kinds of effects: fixed effects and random effects. Fixed effects describe the variation explained by the independent variables of interest, and random effects describe the variation not explained by the independent variables of interest [2]. Because it combines fixed and random effects, the model is called a mixed model. The random effects give structure to the error term ε.

$$Y = X\beta + Z\gamma + \varepsilon$$

Y is a vector of n observations (note that independence is not specified).

X is the n × p design matrix for the fixed effects.

β is a vector of p unknown model parameters.

γ is a vector of q unobservable random effects.

Z is the n × q design matrix for the random effects.

ε is a vector of error terms.
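The definitions above leave the distributions of γ and ε open. Under the usual normality assumptions, which are assumed here rather than stated in the text (γ ~ N(0, G) and ε ~ N(0, R), independent of each other), the model implies the following marginal distribution. Its covariance V is the quantity that the correlation structures discussed later parameterize:

$$
\gamma \sim N(0, G), \quad \varepsilon \sim N(0, R), \quad \operatorname{Cov}(\gamma, \varepsilon) = 0
\;\Longrightarrow\;
Y \sim N\!\left(X\beta,\; V\right), \qquad V = Z G Z^{\top} + R .
$$

For subject i the same argument gives the unconditional variance

$$
V_i = Z_i G Z_i^{\top} + R_i .
$$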

In terms of methodology, this project determines the impact on the power of formal comparisons between groups, based on the final measurement, when the intermediate measurements are included in the analysis via a linear mixed model (covered as a module in ST6034 [1]). This impact on power has been assessed for various numbers of intermediate measurements.

When the assumption of independence is violated because the model contains repeated observations on each subject in each group, it is still reasonable to assume independence between subjects. The model can be developed further so that, when it is fitted, adjustments are made to account for the lack of independence within subjects. This methodology is known as repeated measures ANOVA [2].

Even with these adjustments, repeated measures ANOVA models do not impose compound symmetry on the correlation within subjects.


II. METHODOLOGY


Comparison of Linear Mixed Models in SAS and R

An orthodontic study was conducted on 27 children, 11 girls and 16 boys, all aged eight at the beginning of the study. For each child, the distance from the centre of the pituitary to the pterygomaxillary fissure was measured (in mm) every two years up to age 14. The study objectives were to determine whether the distances were larger for boys than for girls and whether the rate of change in the outcome differed between boys and girls.

SAS: Orthodont.dat 

R: Orthodont {nlme} 


The Orthodont data frame has 108 rows and four columns, recording the change in an orthodontic measurement over time for several young subjects [14].


AIC & BIC from the log-likelihood (LL) - SAS Calculation 

The AIC is defined as the log-likelihood term penalized by the number of model parameters. The larger the 
likelihood, the better the model. The more parameters, the worse the model. 

Because the likelihood is often reported as negative two times the log-likelihood, the following formulation is usually found:

$$\mathrm{AIC} = -2\,\mathrm{LL} + 2d,$$

where -2LL is negative two times the log-likelihood and d is the dimension of the model. Smaller values of AIC are better than larger ones.

$$\mathrm{BIC} = -2\,\mathrm{LL} + d \log(n),$$

where LL denotes the maximum value of the (possibly restricted) log-likelihood, d the dimension of the model, and n the number of observations (in SAS PROC MIXED, n is the number of subjects).


AIC & BIC from the log-likelihood - R Calculation 


R's generic function BIC() calculates the Bayesian information criterion, also known as Schwarz's Bayesian criterion (SBC), for one or several fitted model objects for which a log-likelihood value can be obtained, according to the formula -2 log-likelihood + npar log(nobs), where npar represents the number of parameters and nobs the number of observations in the fitted model [#].


The generic function AIC() calculates the Akaike information criterion for one or several fitted model objects for which a log-likelihood value can be obtained, according to the formula -2 log-likelihood + 2 npar, where npar represents the number of parameters in the fitted model. When comparing fitted objects, the smaller the AIC, the better the fit.
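A minimal Python sketch of the two formulas, checked against the values reported in Table 8. The parameter count d = 15 is not stated in the text; it is an assumption obtained by back-solving Table 8. With it, the R values are reproduced using n = 108 observations and the SAS BIC is reproduced using n = 27 subjects, consistent with the difference in how the two programs define n.

```python
import math


def aic(loglik: float, d: int) -> float:
    """Akaike information criterion: -2 LL + 2 d."""
    return -2.0 * loglik + 2.0 * d


def bic(loglik: float, d: int, n: int) -> float:
    """Bayesian (Schwarz) information criterion: -2 LL + d log(n), natural log."""
    return -2.0 * loglik + d * math.log(n)


if __name__ == "__main__":
    loglik = -212.3216  # log-likelihood reported by R in Table 8
    d = 15  # assumption: back-solved from Table 8, not stated in the text
    print(f"AIC          = {aic(loglik, d):.4f}")        # 454.6432
    print(f"BIC, n = 108 = {bic(loglik, d, 108):.4f}")   # 494.8752 (observations)
    print(f"BIC, n = 27  = {bic(loglik, d, 27):.1f}")    # 474.1 (subjects)
```

The only difference between the two BIC values is the choice of n; AIC does not depend on n, which is why both programs agree on it.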

The one-way analysis of variance (ANOVA), also known as one-factor ANOVA, is an extension of the independent two-sample t-test for comparing means when there are more than two groups. In one-way ANOVA, the data are organized into several groups based on a single grouping variable (also called a factor variable). The basic principle of the one-way ANOVA test, with practical examples in R, is described in [1].
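A stdlib-only Python sketch of the one-way ANOVA F statistic, given as a generic illustration rather than code from the cited source. With exactly two groups, F equals the square of the pooled two-sample t statistic, which is the sense in which one-way ANOVA extends the t-test.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists of numeric observations, one list per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    # Between-group sum of squares: group size times squared deviation of group mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```

For the two groups [1, 2, 3] and [4, 5, 6] the function gives F = 13.5, and the pooled t statistic is 3 / sqrt(2/3), whose square is also 13.5.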


Repeated Measures Analysis Using Baseline Simulation Study 


Baseline simulation models estimate the dependent variables in the absence of the hypothesized causal effects. The multivariate normal simulation used in this section is commonly used in sequential regression analysis. Baseline simulations are developed to capture the critical patterns in the data that are independent of the hypothesized effects, and they are compared with designs formed using explanatory models (i.e., the linear model). The insights gained are used to improve the explanatory models. The baseline simulation is carried out in the next section using the modified baseline simulation study [2].

Baseline simulation modelling has received increasing attention in social and biological science experiments. Baseline simulation studies raise general methodological questions related to scientific inquiry.

In this section, we compare an explanatory model (a linear model using T3 only) with a baseline model (power estimated using the contrast at T3). The baseline models were developed using data simulated with mvrnorm [12] for Group A (n = 20) and Group B (n = 20), with 5000 simulations. Initially, the number of simulations was set to 1000, but the baseline models changed over time as we made them more sophisticated [2]. As we moved on to the different scenarios, we increased the number of simulations to 5000.

Although the explanatory model has a more complex alternative, we can estimate power from the p-values generated by the models. The statistical power of a simulation study is the probability, conditional on a given effect size, that the hypothesis test correctly rejects the null hypothesis. Power analysis therefore tells us how likely we are to conclude that an effect exists when there is one. It rests on the idea that a particular study can be replicated exactly, over and over again [2]. For example, if the power of the study is 65%, then 65% of replications of the study will correctly reject the null hypothesis.


To estimate power, we simulate replications of the study, conduct the hypothesis test on each, and record the percentage of tests that reject the null hypothesis. In our research, we use 5000 simulated replications, generated independently and at random. The proportion of replications in which the null hypothesis is rejected is the estimated power.
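The procedure above can be sketched in a few lines of stdlib Python. This is an illustration under simplifying assumptions, not the authors' R code: it uses only the T3 means of Table 1 (Group A ~ N(102, 5), Group B ~ N(110, 5)), draws independent observations, and applies a pooled two-sample t-test with the two-sided 5% critical value for 38 degrees of freedom hard-coded (about 2.0244). The names `pooled_t` and `simulated_power` are hypothetical.

```python
import random

N_PER_GROUP = 20
# Two-sided 5% critical value of Student's t with 2 * 20 - 2 = 38 df.
T_CRIT_38 = 2.0244


def _mean_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def pooled_t(a, b):
    """Pooled-variance two-sample t statistic for mean(b) - mean(a)."""
    (ma, va), (mb, vb) = _mean_var(a), _mean_var(b)
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (mb - ma) / (sp2 * (1 / na + 1 / nb)) ** 0.5


def simulated_power(mu_a, mu_b, sigma, n_sims=5000, seed=1):
    """Share of simulated studies whose t-test rejects H0: mu_a == mu_b at 5%."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        a = [rng.gauss(mu_a, sigma) for _ in range(N_PER_GROUP)]
        b = [rng.gauss(mu_b, sigma) for _ in range(N_PER_GROUP)]
        if abs(pooled_t(a, b)) > T_CRIT_38:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # T3 means from Table 1: Group A ~ N(102, 5), Group B ~ N(110, 5).
    print(simulated_power(102, 110, 5))
```

Setting `mu_a == mu_b` is a useful sanity check: the estimated "power" should then be close to the nominal 5% significance level.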

Each replication is based on the same data-generating model (i.e., it has the same effect size, the same sample size, the same level of individual variation and the same level of group variation) [1].


The idea of the baseline simulation model is used in a sequence of regression calculations. The series of baseline models includes both the control variables and other variables. We run our 5000 simulations on a randomized design with Group A and Group B observed at time points T1, T2 and T3.


Table 1: Baseline simulation values for the simulation study


Group         T1               T2          T3
A (n = 20)    N(μ=100, σ=5)    N(103, 5)   N(102, 5)
B (n = 20)    N(100, 5)        N(105, 5)   N(110, 5)
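Table 1 gives the mean at each time point and a common σ, and the text states that the data were drawn with mvrnorm under an autoregressive correlation. As a stdlib-only stand-in for R's mvrnorm (an assumption; the authors worked in R), the sketch below draws each subject's trajectory as μ + Lz, where L is the Cholesky factor of the AR(1) covariance σ²ρ^|i−j| and z is a vector of independent standard normals. The value ρ = 0.5 is one of the three settings used in the study; the helper names are hypothetical.

```python
import math
import random


def ar1_cov(sigma, rho, n_times):
    """Covariance matrix with entries sigma^2 * rho^|i - j| (AR(1) structure)."""
    return [[sigma ** 2 * rho ** abs(i - j) for j in range(n_times)]
            for i in range(n_times)]


def cholesky(a):
    """Lower-triangular L with L L^T = a; a must be symmetric positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_subject(means, low, rng):
    """One multivariate normal trajectory: means + L z, with z ~ N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in means]
    return [m + sum(low[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(means)]


def simulate_group(means, sigma, rho, n_subjects, rng):
    """Hypothetical helper: n_subjects independent AR(1) trajectories."""
    low = cholesky(ar1_cov(sigma, rho, len(means)))
    return [draw_subject(means, low, rng) for _ in range(n_subjects)]


if __name__ == "__main__":
    rng = random.Random(2024)
    group_a = simulate_group([100, 103, 102], 5, 0.5, 20, rng)  # Table 1, Group A
    group_b = simulate_group([100, 105, 110], 5, 0.5, 20, rng)  # Table 1, Group B
    print(len(group_a), len(group_a[0]))  # 20 3
```

Subjects are drawn independently of each other; the correlation is only within a subject's trajectory.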


The baseline simulations in these regressions differ in their autoregressive correlation, with ρ = 0.25, 0.5 and 0.75. The appropriate repeated measures analysis was studied, and the first power calculation of the sequence is given in the results. The simulated subjects are assumed to be independent of one another. REML in R is used for the repeated measures analysis, and the baseline model's power was calculated for 1000, 2000 and 3000 simulations.

The gls function used here fits the model by restricted maximum likelihood (REML), its default method. The REML method obtains estimators not from the whole likelihood function but from the part of it that is free of the fixed effects of the linear model. In other words, if y = Xb + Zu + e, where Xb is the fixed-effects part, Zu is the random-effects part and e is the error term, then the REML estimates are obtained by maximizing the likelihood function of K'y, where K is a full-rank matrix with columns orthogonal to the columns of the X matrix, that is, K'X = 0. This leads to the REML estimator of the variance-covariance matrix of y, say V, which does not depend on the choice of the matrix K [1].
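In the special case of an ordinary linear model with independent, equal-variance errors (no random effects), building K from the least-squares residual contrasts makes the REML estimate of σ² equal to RSS/(n − p), whereas maximum likelihood gives RSS/n. The sketch below compares the two by simulation under that simplifying assumption; it illustrates the idea behind REML and is not the gls implementation. The helper names and simulation settings are hypothetical.

```python
import random


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def ml_and_reml_sigma2(x, y):
    """ML (RSS / n) and REML (RSS / (n - p)) estimates of the error variance."""
    a, b = fit_line(x, y)
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    n, p = len(y), 2  # p = 2 fixed effects: intercept and slope
    return rss / n, rss / (n - p)


def average_estimates(sigma2=4.0, n=5, reps=4000, seed=3):
    """Monte Carlo averages of both estimators over repeated simulated datasets."""
    rng = random.Random(seed)
    x = list(range(n))
    ml_total = reml_total = 0.0
    for _ in range(reps):
        y = [1.0 + 0.5 * xi + rng.gauss(0.0, sigma2 ** 0.5) for xi in x]
        ml, reml = ml_and_reml_sigma2(x, y)
        ml_total += ml
        reml_total += reml
    return ml_total / reps, reml_total / reps


if __name__ == "__main__":
    # Long-run averages should be near 4 * 3 / 5 = 2.4 (ML) and 4.0 (REML).
    print(average_estimates())
```

With only five observations and two fixed effects, the ML estimate is biased downward by the factor (n − p)/n = 0.6, while the REML estimate averages close to the true σ² = 4.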

REML estimators avoid the downward bias that maximum likelihood variance estimates incur by ignoring the degrees of freedom used to estimate the fixed effects, and they need not equal those obtained from PROC GLM. As explained above, power is the frequency of finding a significant result when there is an actual effect. The idea of power simulation is to simulate the study many times and count the number of times the result is significant. The simulation can be varied using different numbers of parameters and different assumptions, and the power calculated accordingly. The main downside of simulation-based power analysis is that its flexibility can make it challenging to set up the right comparisons. We have come up with a series of assumptions and comparisons, which we present in the next section.


III. MODELING AND ANALYSIS 
DIFFERENCE IN NUMBERS OF REPEATED MEASUREMENTS ANALYSIS 
Repeated measures refer to observational studies or experimental designs in which each subject is measured at several points in time; such data are also called longitudinal data. So far in this simulation study we have used Groups A and B, each with 20 subjects, measured at three time points. We now extend the study to five and eight time points.
As in the earlier simulation study, we use mvrnorm [4]. We have a typical design in which experimental units are allocated randomly to the groups and measured over a series of time points. The two groups, A and B, have 20 subjects each.
In this design, 40 subjects are observed at every time point. The focus of our analysis is determining the power using the contrast at T5 and comparing it with the power at T5 estimated using the linear model.


Table 2: T5: Gradually increasing difference between groups


Group         T1               T2          T3          T4          T5
A (n = 20)    N(μ=100, σ=5)    N(100, 5)   N(100, 5)   N(100, 5)   N(100, 5)
B (n = 20)    N(100, 5)        N(102, 5)   N(104, 5)   N(106, 5)   N(108, 5)


We assessed the rationale for including the repeated measures element in the design by comparing the power of the simulation with a gradually increasing difference between the groups against the power calculated using a T5-only analysis. Repeated measures statistical analyses include repeated measures ANOVA (repeated measures analysis of variance) and the mixed model in which the subject has a random effect [4].


Simulation is used for estimating power or sample size requirements when the study is complex. The sample size and power estimates depend on the variance estimates of the fixed effects in the mixed model.


$$\operatorname{Var}(\hat{\beta}) = \left( \sum_{i=1}^{N} X_i^{\top} V_i^{-1} X_i \right)^{-1}$$

where V_i is the unconditional (marginal) variance of Y_i. In general, the marginal (population-averaged) mean and the conditional (subject-specific) mean are not equal. While the estimation approaches differ, the results are the same as with ANOVA. However, we are now using a tool that can handle additional time points, continuous covariates with possibly nonlinear relationships, different types of outcome variables, other correlation structures among the observations, and so on [11]. If one looks at the help file for gls, one will note that it suggests using lmer for unbalanced designs and other situations (see the appendix for code) [5].
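The variance formula above can be evaluated directly for a small case. The sketch assumes a deliberately simplified, hypothetical model in which the mean is β0 + β1·[group B] at every time point, with V_i = σ²R and R the AR(1) correlation matrix; this is not the study's model, which has time-specific means. It shows how within-subject correlation reduces the information that repeated measurements contribute.

```python
def ar1_corr(rho, n_times):
    """AR(1) correlation matrix with entries rho^|i - j|."""
    return [[rho ** abs(i - j) for j in range(n_times)] for i in range(n_times)]


def invert(m):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def var_group_effect(sigma, rho, n_times, n_a=20, n_b=20):
    """Var of b1 in the hypothetical model mean = b0 + b1 * [group B], V_i = sigma^2 R."""
    v_inv = invert([[sigma ** 2 * r for r in row] for row in ar1_corr(rho, n_times)])
    info = [[0.0, 0.0], [0.0, 0.0]]  # accumulates sum_i X_i^T V_i^{-1} X_i
    for group_b, count in ((0.0, n_a), (1.0, n_b)):
        x_row = [1.0, group_b]  # every row of X_i is (1, group indicator)
        for a in range(2):
            for b in range(2):
                quad = sum(x_row[a] * v_inv[t][u] * x_row[b]
                           for t in range(n_times) for u in range(n_times))
                info[a][b] += count * quad
    return invert(info)[1][1]


if __name__ == "__main__":
    for rho in (0.0, 0.25, 0.5, 0.75):
        print(rho, round(var_group_effect(5, rho, 3), 4))
```

With σ = 5, three time points and 20 subjects per group, the variance is 25/3 × (1/20 + 1/20) ≈ 0.833 under independence and rises to 1.5 when ρ = 0.5, because correlated repeated measurements carry less independent information.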


Table 3: T8: Gradually increasing difference that levels out


Group         T1               T2          T3          T4          T5          T6          T7          T8
A (n = 20)    N(μ=100, σ=5)    N(100, 5)   N(100, 5)   N(100, 5)   N(100, 5)   N(100, 5)   N(100, 5)   N(100, 5)
B (n = 20)    N(100, 5)        N(102, 5)   N(104, 5)   N(106, 5)   N(108, 5)   N(108, 5)   N(108, 5)   N(108, 5)
CONSIDERING LOWER VALUES OF μB (MEAN)

A slightly more complicated baseline simulation model for the time series data assumes that the variables continue to change over time. Here the data consist of five and eight time points, with the power analysis based on T5 and T8 respectively. We use autoregressive structures with ρ = 0.25, 0.50 and 0.75 and 5000 simulations. By one measure, the baseline model produces more accurate results with Groups A and B having 20 subjects each.


Table 4: T5: Gradually increasing difference between groups


μB at T5    Group B means at T1, T2, T3, T4, T5

μB = 101    100, 100.25, 100.5, 100.75, 101
μB = 102    100, 100.5, 101, 101.5, 102
μB = 103    100, 100.75, 101.5, 102.25, 103
μB = 104    100, 101, 102, 103, 104
μB = 105    100, 101.25, 102.5, 103.75, 105
μB = 106    100, 101.5, 103, 104.5, 106
μB = 107    100, 101.75, 103.5, 105.25, 107
μB = 108    100, 102, 104, 106, 108


We consider lower values of μB at T5 (107, 106, 105, 104, 103, 102, 101) while using the same relative gradual increases from T1 to T5 (e.g., if μB at T5 = 104, then T2 = 101, T3 = 102, T4 = 103 and T5 = 104). Group A is fixed as above, and these changes are applied only to Group B. Because Group A of the simulation model is unchanged, the inertial tendencies associated with the dependent variable are held fixed, and we can focus on the proposed effects on the time-to-time changes, whether percentage changes or absolute changes [6].
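The rule described above (equal steps from 100 at T1 to μB at T5, then flat through T8 in the "levels out" design of Table 3) can be written as a small helper. The function name is hypothetical; it only encodes the mean pattern stated in the text and tables.

```python
def group_b_means(mu_b_t5, n_times=5, baseline=100.0):
    """Hypothetical helper: Group B mean at each time point.

    The mean rises in four equal steps from `baseline` at T1 to `mu_b_t5`
    at T5 and stays at `mu_b_t5` for any later time point (the T8
    'levels out' design). Group A stays at `baseline` throughout.
    """
    step = (mu_b_t5 - baseline) / 4  # four steps between T1 and T5
    return [baseline + step * min(t, 4) for t in range(n_times)]


if __name__ == "__main__":
    for mu_b in range(101, 109):
        print(mu_b, group_b_means(mu_b), group_b_means(mu_b, n_times=8))
```

For example, `group_b_means(104)` gives [100, 101, 102, 103, 104], matching the worked example in the text.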


Table 5: T8: Gradually increasing difference that levels out


μB at T5    Group B means at T1 to T8

μB = 101    100, 100.25, 100.5, 100.75, 101, 101, 101, 101
μB = 102    100, 100.5, 101, 101.5, 102, 102, 102, 102
μB = 103    100, 100.75, 101.5, 102.25, 103, 103, 103, 103
μB = 104    100, 101, 102, 103, 104, 104, 104, 104
μB = 105    100, 101.25, 102.5, 103.75, 105, 105, 105, 105
μB = 106    100, 101.5, 103, 104.5, 106, 106, 106, 106
μB = 107    100, 101.75, 103.5, 105.25, 107, 107, 107, 107
μB = 108    100, 102, 104, 106, 108, 108, 108, 108


The explicit estimation uses the baseline models, which provide potentially valuable information for interpreting the observed effects. This is one reason we have tended to use other dependent variables to obtain the power analysis from period-to-period changes. The next section focuses on the distribution of the repeated measures using higher values of σ (standard deviation). When we use the baseline simulation models to capture the trends in the simulation study of T5 and T8, we can identify and evaluate accelerations or decelerations in power resulting from the changes made.

CONSIDER HIGHER VALUES OF σA = σB


We propose a simulation-based methodology for estimating power when the baseline simulation study is modified to use higher values of σA and σB. We also use the same baseline simulation study to compare the power with higher values of σA and a constant σB. We carried this out for the T5 and T8 analyses, and this section explains why.


The datasets in the simulation study have a likelihood function constructed using the autoregressive process with ρ = 0.25, 0.50 and 0.75. The simulation approach uses the mean and standard deviation from the samples, based on a set of accepted parameter values. We repeat the procedure for all the accepted parameter settings [7].

To reject the null hypothesis, the power simulation requires a large enough difference between the groups. We assessed the results of 40 studies that used repeated measures analysis (RMA) and of the analysis at time point 5, which does not use RMA, and examined how the reports evaluated the normality of the residual errors. Regarding reporting, we checked the groups' data for homogeneity of variance for at least one of the outcomes, as described by the standard deviation values [8].

Many studies have made a series of increasingly complex calculations that apply the baseline simulation model methodology in sequence. We can also carry out the same investigation with data that have eight time points, where each group has 20 subjects, using 5000 simulations. Because many power analyses involve multiple outcomes, we compare the power from the RMA analysis with that from the non-RMA analysis.

DIFFERING INCORRECTLY SPECIFIED CORRELATION STRUCTURES 


In this section, we look at the criteria for variable selection when modelling the mean function and for selecting among candidate correlation structures when modelling the variance-covariance. The baseline study is carried out with AR(1) correlation structures; other correlation structures are described below [9].

Generalized estimating equations (GEE) can be used for this kind of baseline simulation with longitudinal data: they account for the within-cluster correlation without requiring the variance-covariance structure to be modelled correctly. The estimator of the variance of the estimated parameters is consistent [13] even when the assumed (working) correlation structure is specified incorrectly.

First Order Autoregressive Structure 


This is the simplest autoregressive model: it uses the most recent outcome of the time series to predict future values. For a time series Y_t, such a model is called a first-order autoregressive model, often abbreviated AR(1), where the 1 indicates that the order of autoregression is one [10].

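$$Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t$$

is the AR(1) population model of a time series Y_t. The corresponding AR(1) correlation matrix for four time points, with entries ρ^|i−j|, is

$$R = \begin{pmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{pmatrix}$$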

Other covariance structures that might be appropriate are described below.

A covariance matrix with a diagonal structure. The diagonal covariance matrix is known as the variance components model: it contains the specified variances along the diagonal and zeros elsewhere [10].
A covariance matrix with compound symmetry. The compound symmetry model is the sum of a constant matrix and a diagonal matrix. This structure forms a covariance matrix provided that the diagonal elements are large relative to the off-diagonal elements [10].


A covariance matrix with Toeplitz structure. A Toeplitz matrix has a banded structure. The diagonals that are 
parallel to the main diagonal are constant. If the diagonal elements are large relative to the off-diagonal 
elements, then the Toeplitz matrix is positive definite.[10] 


$$\text{TOEP} = \begin{pmatrix} 4 & 1 & 2 & 3 \\ 1 & 4 & 1 & 2 \\ 2 & 1 & 4 & 1 \\ 3 & 2 & 1 & 4 \end{pmatrix}$$


A covariance matrix with a first-order autoregressive (AR(1)) structure.

An AR(1) structure is a Toeplitz matrix with additional structure. Whereas an n × n Toeplitz matrix has n parameters, an AR(1) structure has two parameters. The values along successive diagonals are related to each other by a multiplicative factor [10].


$$\text{AR(1)},\ \rho = 0.25: \quad \begin{pmatrix} 1 & 0.25 & 0.0625 & 0.015625 \\ 0.25 & 1 & 0.25 & 0.0625 \\ 0.0625 & 0.25 & 1 & 0.25 \\ 0.015625 & 0.0625 & 0.25 & 1 \end{pmatrix}$$
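The four structures described above can be built with a few small constructors, and a Cholesky attempt gives a simple positive-definiteness check. This is a stdlib sketch with hypothetical function names (in R these structures correspond to options such as the correlation classes of nlme); it reproduces the Toeplitz and AR(1) example matrices shown above.

```python
def diagonal_cov(variances):
    """Variance components: given variances on the diagonal, zeros elsewhere."""
    n = len(variances)
    return [[variances[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def compound_symmetry(sigma2, sigma1_2, n):
    """sigma1^2 * J + sigma^2 * I: constant matrix plus diagonal matrix."""
    return [[sigma1_2 + (sigma2 if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def toeplitz(first_row):
    """Banded matrix: each diagonal parallel to the main one is constant (n parameters)."""
    n = len(first_row)
    return [[first_row[abs(i - j)] for j in range(n)] for i in range(n)]


def ar1(sigma2, rho, n):
    """AR(1): a Toeplitz matrix with only two parameters, sigma^2 and rho."""
    return toeplitz([sigma2 * rho ** k for k in range(n)])


def is_positive_definite(a):
    """Attempt a Cholesky factorization; it succeeds only for PD matrices."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0:
                    return False
                low[i][i] = d ** 0.5
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return True


if __name__ == "__main__":
    print(toeplitz([4, 1, 2, 3]))
    print(ar1(1.0, 0.25, 4)[0])  # first row: 1, 0.25, 0.0625, 0.015625
```

The positive-definiteness check makes the conditions quoted above concrete: the example Toeplitz matrix passes, while a matrix such as `toeplitz([1, 2, 0])`, whose off-diagonal element exceeds the diagonal, does not.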


This part of the study estimates power under several different variance-correlation structures, including independence. The independent estimates of the model parameter vector obtained in this way can then be combined into a single estimate of that vector [11].

DIFFERING VARIABILITY IN GROUPS


The standard approach of these models is to estimate the frequency response of the measure. We now look at different levels of variability: the variability is increased in one group only, with its standard deviation taking the values 5, 6, 7, 8 and 9 over time. This analysis indicates whether the estimated coefficients can still detect such potential problems in the variability.


Table 6: T5: Gradually increasing difference between groups with σB increasing


Group         T1               T2          T3          T4          T5
A (n = 20)    N(μ=100, σ=5)    N(100, 5)   N(100, 5)   N(100, 5)   N(100, 5)
B (n = 20)    N(100, 5)        N(102, 6)   N(104, 7)   N(106, 8)   N(108, 9)


As in the previous section, the datasets in the simulation study have a likelihood function constructed using the autoregressive process with ρ = 0.25, 0.50 and 0.75. The simulation approach uses the mean and standard deviation from the samples, based on a set of accepted parameter values, and we repeat the procedure for all the accepted parameter settings [12].


Table 7: T8: Gradually increasing difference between groups which levels out, with σB increasing

Group         T1               T2          T3          T4          T5          T6          T7          T8
A (n = 20)    N(μ=100, σ=5)    N(100, 5)   N(100, 5)   N(100, 5)   N(100, 5)   N(100, 5)   N(100, 5)   N(100, 5)
B (n = 20)    N(100, 5)        N(102, 6)   N(104, 7)   N(106, 8)   N(108, 9)   N(108, 9)   N(108, 9)   N(108, 9)
The power of a test is usually obtained from the associated non-central distribution [12]. Many hypotheses can be tested, but two tests are the most common. The probability of failing to reject the null hypothesis when it is false is β (the Type II error rate), so the power of the test is calculated as 1 - β. Therefore, the gls fit can be used to compute the power when the variance parameters are given.

Repeated measures analysis is needed when each observation on a case is directly linked with that case's other observations, a dependence described by the variance-covariance structure. Baseline differences that might affect the outcome could be among the main parameters [13]. The repeated measures ANOVA design is appropriate for within-subject factors. Such designs involve repeated measurements, which also affect the between-subjects part of the ANOVA.


IV. RESULTS AND DISCUSSION 


The Gls function in the rms package fits a linear model using generalized least squares. The errors are allowed to be correlated and to have unequal variances. Gls is a slightly enhanced version of the Pinheiro and Bates gls function in the nlme package, designed to make it easy to use with the rms package and to implement cluster bootstrapping (primarily for nonparametric estimates of the variance-covariance matrix of the parameter estimates and for nonparametric confidence limits of correlation parameters).


Figure 1: Comparison of CDFs
Table 8: Results of AIC, BIC and log-likelihood values

       AIC         BIC         -2 log-likelihood
SAS    454.6       474.1       424.6
R      454.6432    494.8752    424.6432 = (-2)(-212.3216)


RESULTS OF REPEATED MEASURES ANALYSIS USING BASELINE SIMULATION STUDY 


Table 9: Results of T3, AR(1), 5000 simulations


T3 analysis, AR(1),    Autoregressive 0.25    Autoregressive 0.5    Autoregressive 0.75
5000 simulations       P1        P2           P1        P2          P1        P2
                       86.6667   98.9667      78.7667   98.8        74.677    97.8


Because the baseline model offers a single coherent explanation, 5000 simulations were sufficient for our needs. Power was estimated for the overall comparison between Group A and Group B: Power 1 (P1) was calculated from the contrast at T3 only, using repeated measures analysis with the gls function, and Power 2 (P2) was calculated using an appropriate linear model that includes only T3. Regression coefficients are unreliable indicators of the importance of independent variables [3], and the level of significance is associated with the estimates of the statistical value.

RESULTS OF DIFFERENCE IN NUMBERS OF REPEATED MEASUREMENTS ANALYSIS 


During the analysis, the number of simulations was set to 5000, as there was little difference in power between 1000, 2000 and 3000 simulations. Repeated measures correlation is a method used to determine the within-subject association across multiple subjects. Repeated observations were modelled using a multivariate likelihood function [6].


Table 10: Results of T5 Analysis, Gradually Increasing difference between groups


Gradual increasing           Autoregressive 0.25    Autoregressive 0.5    Autoregressive 0.75
difference between groups    P1       P2            P1       P2           P1       P2
                             99.88    99.84         99.88    99.8         99.92    99.9


This kind of process model is called an exponential decay model: it grows rapidly and then levels off, approaching the upper limit asymptotically [6]. Many models for longitudinal theories start by capturing assumptions about period-to-period change, namely the inertia in the magnitude and rate of change. This allows the baseline models, which assume that the dependent variable will not change, to fit the time series data over time very well.


Table 11: Results of T8 Analysis, Gradually Increasing difference between groups that levels out


Gradual increasing difference     Autoregressive 0.25    Autoregressive 0.5    Autoregressive 0.75
between groups that levels out    P1       P2            P1       P2           P1       P2
                                  99.92    99.82         99.90    99.82        99.94    99.88
RESULTS OF CONSIDERING LOWER VALUES OF μB (MEAN)

Chart: Power 1 vs Power 2, AR(1) = 0.25, for μB = 101 to 108
The focus in the next section will be to explicitly simulate random effects and try to interpret the complex differences in the trajectories using higher values of standard deviation. We ran two kinds of simulation: one uses σA = σB, and the other changes only σA while keeping σB constant.


Table 12: Results of T5 Analysis, Gradually Increasing difference between groups


Gradual increasing           Autoregressive 0.25    Autoregressive 0.5    Autoregressive 0.75
difference between groups    P1       P2            P1       P2           P1       P2
μB = 101                     9.62     8.34          8.62     9.34         9.4      9.34
μB = 102                     23.92    26.78         23.96    21.47        23.92    22.85
μB = 103                     45.78    44.68         45.77    44.58        45.8     44.33
μB = 104                     70.22    68.14         70.22    66.18        70.28    68.155
μB = 105                     87.68    86.58         87.5     86.68        87.7     86.54
μB = 106                     96.42    94.86         96.3     95.7         96.75    95.45
μB = 107                     99.38    98.1          99.8     99.1         99.74    99.1
μB = 108                     99.88    97.84         99.7     99.3         99.65    99.84
Table 13: Results of T8 Analysis, Gradually Increasing difference between groups that levels out

Gradual increasing difference     Autoregressive 0.25    Autoregressive 0.5    Autoregressive 0.75
between groups that levels out    P1       P2            P1       P2           P1       P2
μB = 101                          7.32     9.36          8.2      9.8          7.32     9.36
μB = 102                          11.24    23.46         13.2     21.98        11.24    23.46
μB = 103                          19.28    45.12         19       44.9         19.28    45.12
μB = 104                          30.4     69.54         38.2     65.18        30.4     69.54
μB = 105                          44.88    86.34         52.06    89.18        44.88    86.34
μB = 106                          58.64    95.94         54.66    94.9         58.64    95.94
μB = 107                          71.94    99.12         72.3     21.98        71.94    99.12
μB = 108                          81.54    99.82         80.8     99.8         81.54    99.82


The greater the sample size, the higher the power; this principle underlies the previous graph. We also learned in the preceding section that anything that increases the denominator of the test statistic decreases the power. Conversely, anything that decreases that denominator, such as a larger sample size that gives more data to analyze, increases the power.
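The denominator argument can be made concrete with a normal-approximation power formula. This sketch makes simplifying assumptions: it ignores the repeated measures, treats the comparison as a single two-sided, two-sample test at T5, and uses the normal distribution instead of a t or non-central distribution. The standard error σ·sqrt(2/n) is the denominator of the test statistic, so a larger σ or a smaller n lowers the power. For a difference of 8 with 20 subjects per group, the values it prints run slightly above the P2 column reported for T5 in Table 14, which is expected because the normal approximation slightly overstates the power of a t-test.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test of means.

    The test statistic is delta / SE with SE = sigma * sqrt(2 / n_per_group);
    a larger denominator (bigger sigma or smaller n) means lower power.
    """
    se = sigma * (2.0 / n_per_group) ** 0.5
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta / se
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


if __name__ == "__main__":
    # T5 difference of 8 (Table 2), 20 per group, increasing common sigma.
    for sigma in (5.0, 7.5, 10.0, 12.5, 15.0, 17.5, 20.0):
        print(sigma, round(100 * approx_power(8, sigma, 20), 1))
    # Same difference with sigma = 10, increasing sample size per group.
    for n in (10, 20, 40, 80):
        print(n, round(100 * approx_power(8, 10.0, n), 1))
```

With no true difference (delta = 0) the function returns the significance level, 0.05, as a power function should.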
RESULTS OF CONSIDERING HIGHER VALUES OF σA = σB


Chart: Consider higher values of σA = σB (power for σA = σB = 7.5, 10.0, 12.5, 15.0, 17.5 and 20.0)


The means and standard deviations were altered in the previous section, and in this section the standard deviation increases for both groups. In the next study, we compare scenarios in which the standard deviation changes for only one group.


Table 14: Results of T5 Analysis, Gradually Increasing difference between groups 


Gradual increasing          Autoregressive 0.25   Autoregressive 0.5    Autoregressive 0.75
difference between groups   P1         P2         P1         P2         P1         P2

σA = σB = 5.0               99.88      99.84      99.88      99.8       99.92      99.9
σA = σB = 7.5               91.34      90.2       91.4       90.38      91.58      90.66
σA = σB = 10.0              70.22      68.18      70.88      69.38      70.68      68.88
σA = σB = 12.5              50.14      48.7       52.06      50.18      52.4       50.16
σA = σB = 15.0              38.44      37.42      38.2       36.56      38.7       37.02
σA = σB = 17.5              29.22      27.92      29         27.9       29.92      28.46
σA = σB = 20.0              23.92      22.76      23.2       21.98      23.76      22.58


This directionality refers to how a change in the variance affects the power under the non-directional hypothesis, in which the experimental group has higher power than the usual subjects. Table 14 shows that as the variability increases, the power decreases; in the Autoregressive 0.25 P1 column, for example, power falls from 99.88 at σA = σB = 5.0 to 23.92 at σA = σB = 20.0. When the sample data have greater variability, random sampling error is more likely to produce large differences between the experimental groups even when there is no real effect. If the sample data in a study have sufficient variability, random error alone might be responsible for a large observed difference.
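The point about random sampling error can be checked with a short calculation. Assuming two independent groups of n subjects drawn from the same normal distribution with standard deviation σ (so there is no real effect), the difference between the sample means is normal with mean 0 and standard deviation σ√(2/n). The sketch below computes the chance that this difference exceeds a fixed threshold in absolute value; the threshold of 4 units, n = 20 and the function name are arbitrary illustrative choices.

```python
import math
from statistics import NormalDist

def prob_spurious_difference(threshold: float, sigma: float, n_per_group: int) -> float:
    """P(|mean_A - mean_B| > threshold) when both groups share the same true mean."""
    sd_of_difference = sigma * math.sqrt(2.0 / n_per_group)
    # Two-sided tail probability of a N(0, sd_of_difference**2) variable.
    return 2.0 * (1.0 - NormalDist(0.0, sd_of_difference).cdf(threshold))

for sigma in (5.0, 10.0, 20.0):
    print(sigma, round(prob_spurious_difference(4.0, sigma, 20), 4))
```

The probability of a chance difference of at least 4 units rises steeply with σ, even though the true difference is zero throughout.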


Table 15: Results of T5 Analysis, Gradually Increasing difference between groups 


Gradual increasing          Autoregressive 0.25   Autoregressive 0.5    Autoregressive 0.75
difference between groups   P1         P2         P1         P2         P1         P2
σB = 5.0                    99.88      99.84      99.88      99.8       99.92      99.9
σB = 7.5                    97.42      96.98      97.5       96.8       97.42      97.12
σB = 10.0                   88.72      87.52      88.62      87.16      88.44      87.04
σB = 12.5                   74.58      73.4       74.98      73.22      74.66      73.46
σB = 15.0                   61.4       59.42      61.18      59.2       61.72      60.16
σB = 17.5                   49.42      47.64      49.28      49.06      50.18      48.54
σB = 20.0                   40.36      39.46      40.62      40.04      40.74      39.92

Table 16: Results of T8 Analysis, Gradually Increasing difference between groups and levels out 
Gradual increasing          Autoregressive 0.25   Autoregressive 0.5    Autoregressive 0.75
difference between groups
that levels out             P1         P2         P1         P2         P1         P2
σA = σB = 5.0               99.9       99.82      99.86      99.82      99.94      99.88
σA = σB = 7.5               92.98      91.26      91.96      90.64      92.18      91.16
σA = σB = 10.0              72.14      70.32      71.96      68.78      71.58      69.44
σA = σB = 12.5              52.24      51.32      53.54      50.1       52.84      51.24
σA = σB = 15.0              38.38      38.44      39.2       39.2       39.32      37.96
σA = σB = 17.5              29.44      30.32      29.52      29.36      30.08      29.34
σA = σB = 20.0              23.08      23.92      24.48      22.96      23.88      24.02


To reject the null hypothesis, we need a sufficiently large difference between the groups. Two kinds of error are possible: a Type I error (rejecting a null hypothesis that is true) and a Type II error (failing to reject a null hypothesis that is false). An experimental study therefore needs to detect a real difference between the groups when one exists and to avoid declaring a difference when there is none. When the difference between the groups is not large enough to reject the null hypothesis, we either correctly retain a true null hypothesis or commit a Type II error.[12]
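Both error rates can be estimated by repeated simulation, in the same spirit as this study. The sketch below is a simplification, not the mixed-model analysis used here: it assumes two independent normal groups with a known common σ, an arbitrary baseline mean of 100, and a two-sided z-test at α = 0.05. With no true difference, the rejection rate estimates the Type I error rate; with a true difference, it estimates the power, and one minus the power estimates the Type II error rate. The function name, seed and parameter values are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean

def rejection_rate(true_diff, sigma, n_per_group, n_sims=5000, alpha=0.05, seed=1):
    """Fraction of simulated two-sample z-tests that reject H0: mu_A == mu_B."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    standard_error = sigma * math.sqrt(2.0 / n_per_group)
    rejections = 0
    for _ in range(n_sims):
        group_a = [rng.gauss(100.0, sigma) for _ in range(n_per_group)]
        group_b = [rng.gauss(100.0 + true_diff, sigma) for _ in range(n_per_group)]
        z = (fmean(group_b) - fmean(group_a)) / standard_error
        if abs(z) > critical:
            rejections += 1
    return rejections / n_sims

type_i = rejection_rate(true_diff=0.0, sigma=10.0, n_per_group=20)
power = rejection_rate(true_diff=10.0, sigma=10.0, n_per_group=20)
print("Type I error rate:", type_i)
print("Power:", power, "Type II error rate:", 1.0 - power)
```

The estimated Type I error rate should land near the nominal 0.05, and the estimated power near the normal-theory value of roughly 0.89 for these settings.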


Table 17: Results of T8 Analysis, Gradually Increasing difference between groups and levels out 


Gradual increasing          Autoregressive 0.25   Autoregressive 0.5    Autoregressive 0.75
difference between groups
that levels out             P1         P2         P1         P2         P1         P2

σB = 5.0                    99.68      99.84      99.88      99.8       99.75      99.9
σB = 7.5                    97.8       96.98      97.5       96.8       97.9       97.56
σB = 10.0                   88.73      87.52      88.2       87.16      88.72      87.2
σB = 12.5                   73.24      73.4       74.9       73.22      74.66      73.75
σB = 15.0                   60.9       59.42      61.18      59.28      61.68      60.96
σB = 17.5                   48.6       47.64      49.28      49.88      50.26      48.54
σB = 20.0                   46.8       39.46      40.62      40.96      40.74      38.6


RESULTS OF DIFFERING INCORRECTLY SPECIFIED CORRELATION STRUCTURES 

Vector regression shares many features with longitudinal studies: the observations within each vector are correlated, while different vectors are independent. The primary interest in such regression problems is to estimate the parameters and carry out joint inferences. A vital feature of the longitudinal setting is that the variances and correlations need not be specified correctly. Data points can be missing, but this does not significantly affect the power because we run about 5000 simulations. The correlation structures used are listed below.
Table 18: Results of T5 Analysis, Gradually Increasing difference between groups
Gradual increasing          Autoregressive 0.25   Autoregressive 0.5    Autoregressive 0.75
difference between groups   P1         P2         P1         P2         P1         P2
AR(1)                       99.88      99.84      99.88      99.8       99.92      99.9
CS                          99.88      99.84      99.88      99.8       99.92      99.9
Toep                        99.88      99.84      99.88      99.8       99.92      99.9
Weighted                    99.88      99.84      99.88      99.8       99.92      99.9
VC                          99.88      99.84      99.88      99.8       99.92      99.9
UN                          99.88      99.84      99.88      99.8       99.92      99.9
Table 19: Results of T8 Analysis, Gradually Increasing difference between groups that levels out 
Gradual increasing          Autoregressive 0.25   Autoregressive 0.5    Autoregressive 0.75
difference between groups
that levels out             P1         P2         P1         P2         P1         P2
AR(1)                       99.9       99.82      99.86      99.82      99.94      99.88
CS                          99.9       99.82      99.86      99.82      99.94      99.88
Toep                        99.9       99.82      99.86      99.82      99.94      99.88
Weighted                    99.9       99.82      99.86      99.82      99.94      99.88
VC                          99.9       99.82      99.86      99.82      99.94      99.88
UN                          99.9       99.82      99.86      99.82      99.94      99.88


The idea is to model the vectors consistently, evaluate them with a power analysis of the model, and use the results to understand the errors in the model. Tables 18 and 19 show identical power under all six structures (AR(1), CS, Toep, Weighted, VC, and UN). In the next section, we look at differing variability in the groups.[11]
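For readers unfamiliar with the abbreviations in Tables 18 and 19, the sketch below builds covariance matrices for four of the named structures using common textbook definitions; these definitions are assumptions, not this study's code. AR(1) sets the correlation between times i and j to ρ^|i-j|; compound symmetry (CS) uses one common correlation for every pair; Toeplitz (Toep) allows a separate correlation for each lag; and the diagonal matrix is the independence form assumed here for VC. The unstructured (UN) and weighted structures are not shown, and all function names are hypothetical.

```python
def ar1(n_times, sigma2, rho):
    """AR(1): Cov(i, j) = sigma2 * rho**|i - j|."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n_times)] for i in range(n_times)]

def compound_symmetry(n_times, sigma2, rho):
    """CS: equal variances and one common correlation for every pair of times."""
    return [[sigma2 if i == j else sigma2 * rho for j in range(n_times)] for i in range(n_times)]

def toeplitz(n_times, sigma2, lag_corrs):
    """Toeplitz: lag_corrs[k - 1] is the correlation between times k apart."""
    def corr(lag):
        return 1.0 if lag == 0 else lag_corrs[lag - 1]
    return [[sigma2 * corr(abs(i - j)) for j in range(n_times)] for i in range(n_times)]

def diagonal(n_times, sigma2):
    """Independence (diagonal) structure: no correlation between time points."""
    return [[sigma2 if i == j else 0.0 for j in range(n_times)] for i in range(n_times)]

for row in ar1(4, 1.0, 0.5):
    print(row)
```

AR(1) needs one correlation parameter regardless of the number of time points, while Toeplitz needs one per lag, which is one reason the structures differ in how many covariance parameters they estimate.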

RESULTS OF DIFFERING VARIABILITY IN GROUPS


In a repeated measures analysis using GLS, we can separate the between-subject variability from the variability in the error terms. Repeated measures ANOVA uses a model that includes zero or more independent variables. In this section, we use the dependent-samples t-test, which likewise compares mean scores. Repeated measures analysis is also called the analysis of dependencies[13].
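A dependent-samples t-test works on within-subject differences: with d_i the difference between a subject's two measurements, t = mean(d) / (sd(d)/√n) on n - 1 degrees of freedom. The sketch below computes only the statistic, not a p-value (which would need a t-distribution routine outside the Python standard library); the function name and the measurements are made-up illustrative values, not study data.

```python
import math
from statistics import fmean, stdev

def paired_t(before, after):
    """Dependent-samples t statistic and degrees of freedom for paired data."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(before, after)]
    n = len(diffs)
    # stdev uses the n - 1 denominator (sample standard deviation).
    t = fmean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

t, df = paired_t([20.0, 22.0, 19.0, 24.0, 21.0], [22.0, 25.0, 20.0, 27.0, 23.0])
print(round(t, 3), df)
```

Because each subject serves as its own control, the between-subject variability cancels out of the differences, leaving only the within-subject variability in the denominator.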


Table 20: Results of T5 Analysis, Gradually Increasing difference between groups 


Increasing variability in   Autoregressive 0.25   Autoregressive 0.5    Autoregressive 0.75
both groups, σA = σB        P1         P2         P1         P2         P1         P2
5, 6, 7, 8, 9               85.74      78.1       85.32      77.96      87.1       77.92


Table 21: Results of T8 Analysis, Gradually Increasing difference between groups and levels out 


Increasing variability in   Autoregressive 0.25   Autoregressive 0.5    Autoregressive 0.75
one group, σB               P1         P2         P1         P2         P1         P2
5, 6, 7, 8, 9               95.66      92.36      95.64      92.04      95.66      92.7


V. SUMMARY AND FINAL THOUGHTS 


The first stage of our analysis was a comparison of SAS and R to replicate the dental data results. In SAS, AIC is calculated as -2LL + 2d, where LL is the maximum value of the log-likelihood and d is the dimension of the model. In the case of restricted (REML) likelihood estimation, d is the number of effective estimated covariance parameters; in this case, that is two parameters, as shown in the output.[5]
R, on the other hand, uses the degrees of freedom as calculated by Pinheiro and Bates, whose interpretation of degrees of freedom in the context of a mixed model differs considerably from the one used by SAS.[4] AIC and BIC are both penalized-likelihood criteria. They are used to choose the best predictor subsets in regression and are often used to compare non-nested models, which standard statistical tests cannot do.

The AIC or BIC for a model is usually written in the form [-2logL + kp], where L is the likelihood function, p is 
the number of parameters in the model, and k is 2 for AIC and log(n) for BIC.[5] 
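The penalized-likelihood formula can be written directly as code. This is a minimal sketch with made-up log-likelihood values and hypothetical function names; what counts as p depends on the software and estimation method, as the SAS and R comparison above illustrates, so the numbers are illustrative only.

```python
import math

def information_criterion(log_lik: float, n_params: int, k: float) -> float:
    """Generic penalized likelihood: -2 log L + k * p."""
    return -2.0 * log_lik + k * n_params

def aic(log_lik: float, n_params: int) -> float:
    return information_criterion(log_lik, n_params, k=2.0)

def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    # BIC uses k = log(n); it penalizes more heavily than AIC once log(n) > 2, i.e. n >= 8.
    return information_criterion(log_lik, n_params, k=math.log(n_obs))

# Hypothetical comparison: a larger model gains 3 log-likelihood units for 2 extra parameters.
small, large = (-250.0, 2), (-247.0, 4)
n_obs = 100
print("AIC:", aic(*small), aic(*large))
print("BIC:", round(bic(*small, n_obs), 2), round(bic(*large, n_obs), 2))
```

With these numbers AIC favors the larger model (502 vs 504), while BIC, charging log(100) ≈ 4.61 per parameter, favors the smaller one (about 509.21 vs 512.42): AIC picks the larger model, which is the situation in which the two criteria disagree.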

AIC is an estimate of a constant plus the relative distance between the unknown true likelihood function of the data and the fitted likelihood function of the model, so a lower AIC means that a model is considered to be closer to the truth. BIC is an estimate of a function of the posterior probability of a model being true, under a particular Bayesian setup, so a lower BIC means that a model is considered more likely to be the true model. Both criteria are based on various assumptions and asymptotic approximations. Despite its heuristic usefulness, each has therefore been criticized as having questionable validity for real-world data. Despite various subtle theoretical differences, however, their only difference in practice is the size of the penalty: BIC penalizes model complexity more heavily whenever log(n) > 2, that is, for n ≥ 8. The only way they should disagree is when AIC chooses a larger model than BIC.[6]


We created a baseline simulation study and ran 5000 simulations for autoregressive structures with correlations of 0.25, 0.50, and 0.75. We then varied the factors that affect the power: variance, correlation, standard deviation, mean, number of repeated-measures groups, and number of time points. The purpose of this project was to determine the impact on the power of formal comparisons between groups, based on the final measurement, when intermediate measurements are included in the analysis via a linear mixed model.

This impact was assessed for varying numbers of intermediate measurements, and its nature and extent were evaluated under the conditions listed above.
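The autoregressive structures with ρ = 0.25, 0.50 and 0.75 can be generated with a simple recursion. This sketch is not the study's simulation code: it assumes stationary AR(1) errors with unit variance, e_1 ~ N(0, 1) and e_t = ρ e_(t-1) + √(1 - ρ²) z_t, which gives Corr(e_s, e_t) = ρ^|s-t|. The group mean trajectories and σ are omitted for brevity, and the function names and seed are hypothetical. The empirical lag-1 correlation across many simulated subjects should be close to ρ.

```python
import math
import random

def simulate_ar1_subject(n_times, rho, rng):
    """One subject's AR(1) error vector with unit marginal variance."""
    errors = [rng.gauss(0.0, 1.0)]
    innovation_sd = math.sqrt(1.0 - rho ** 2)
    for _ in range(n_times - 1):
        errors.append(rho * errors[-1] + rng.gauss(0.0, innovation_sd))
    return errors

def lag1_correlation(subjects):
    """Pearson correlation between time 1 and time 2 across subjects."""
    x = [s[0] for s in subjects]
    y = [s[1] for s in subjects]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2024)
for rho in (0.25, 0.50, 0.75):
    subjects = [simulate_ar1_subject(5, rho, rng) for _ in range(5000)]
    print(rho, round(lag1_correlation(subjects), 3))
```

Scaling each error vector by σ and adding a group-specific mean at each time point would turn these errors into simulated responses.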

To reject the null hypothesis, we need a sufficiently large difference between the groups. Two kinds of error are possible: a Type I error (rejecting a null hypothesis that is true) and a Type II error (failing to reject a null hypothesis that is false). An experimental study therefore needs to detect a real difference between the groups when one exists and to avoid declaring a difference when there is none. When the difference between the groups is not large enough to reject the null hypothesis, we either correctly retain a true null hypothesis or commit a Type II error.[8]

We increased the number of simulations from 1000 to 5000 to make the comparisons more reliable, and we found that the results were statistically significant with 5000 simulations. We used the GLS function to fit the repeated measures analysis and performed contrasts to obtain the p-values from which the power is calculated.
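One way to quantify the benefit of more simulations is the Monte Carlo standard error of an estimated power: if the true power is p and N simulations are run, the estimate is a binomial proportion with standard error √(p(1 - p)/N). The sketch below is an illustration added for this purpose, not part of the original analysis; it compares N = 1000 and N = 5000 at a few arbitrary power levels, and the function name is hypothetical.

```python
import math

def monte_carlo_se(power: float, n_sims: int) -> float:
    """Standard error of a simulated power estimate (binomial proportion)."""
    return math.sqrt(power * (1.0 - power) / n_sims)

for p in (0.5, 0.7, 0.95):
    se_1000 = 100 * monte_carlo_se(p, 1000)  # in percentage points
    se_5000 = 100 * monte_carlo_se(p, 5000)
    print(f"power {p:.2f}: SE {se_1000:.2f} pts (1000 sims), {se_5000:.2f} pts (5000 sims)")
```

At a true power of 50%, the standard error drops from about 1.58 to about 0.71 percentage points, a reduction by a factor of √5 ≈ 2.24.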


As the Type I error rate increases, the power also increases.


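We consider both directional (one-tailed) and non-directional (two-tailed) hypotheses. As the Type I error rate increases, the power increases under both, and at a given Type I error rate a directional test has more power than a non-directional one when the effect lies in the hypothesized direction.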


[Figure: Nondirectional (two-tailed) and directional (one-tailed) tests]
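The effect of the Type I error rate α and of directionality on power can be sketched with the normal approximation. Assuming a standardized shift Δ (the true difference divided by its standard error) in the hypothesized direction, a one-tailed test has power Φ(Δ - z_(1-α)) and a two-tailed test has power Φ(Δ - z_(1-α/2)) + Φ(-Δ - z_(1-α/2)). The value Δ = 2.5 and the function names are arbitrary illustrative choices.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_one_tailed(delta: float, alpha: float) -> float:
    """Power of a one-tailed z-test when the shift is in the hypothesized direction."""
    return STD_NORMAL.cdf(delta - STD_NORMAL.inv_cdf(1.0 - alpha))

def power_two_tailed(delta: float, alpha: float) -> float:
    """Power of a two-tailed z-test (both rejection regions counted)."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return STD_NORMAL.cdf(delta - z_crit) + STD_NORMAL.cdf(-delta - z_crit)

for alpha in (0.01, 0.05, 0.10):
    print(alpha, round(power_one_tailed(2.5, alpha), 3), round(power_two_tailed(2.5, alpha), 3))
```

Both columns rise with α, and the one-tailed column is higher at every α because its whole rejection region sits on the side where the effect lies.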
This directionality refers to how a change in the variance affects the power under the non-directional hypothesis, in which the experimental group has higher power than the usual subjects. As the variability increases, the power decreases. When the sample data have greater variability, random sampling error is more likely to produce large differences between the experimental groups even when there is no real effect. If the sample data in a study have sufficient variability, random error alone might be responsible for a large observed difference.




As the difference between the means increases, the power increases. This holds for both baseline simulation studies: T5, in which the difference between the groups increases gradually, and T8, in which it increases gradually but then levels out.


No matter how the samples are simulated, as the actual difference between the groups (the effect size) increases, the power also increases.


The higher the error variance, the lower the power; the graph above illustrates this principle. Increased variability within the groups decreases the ability to find a difference that does exist, because when the data are analyzed, the error variance masks the between-subjects effect. Conversely, anything that decreases the variability within the groups increases the power.
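The effect-size and variance principles in the two paragraphs above can be combined in one approximate power formula. Assuming a two-sided two-sample z-test with a common known σ and n subjects per group (a simplification of the mixed-model analysis used in this study), power ≈ Φ(δ/(σ√(2/n)) - z_(1-α/2)), where δ is the true difference in means and the negligible opposite tail is ignored. The function name and parameter values are illustrative.

```python
import math
from statistics import NormalDist

def approx_power(delta: float, sigma: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided two-sample test (opposite tail ignored)."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    noncentrality = delta / (sigma * math.sqrt(2.0 / n_per_group))
    return NormalDist().cdf(noncentrality - z_crit)

# Larger mean difference -> higher power (sigma fixed at 10, n = 20 per group).
print([round(approx_power(d, 10.0, 20), 3) for d in (2.0, 5.0, 8.0)])
# Larger sigma -> lower power (difference fixed at 8, n = 20 per group).
print([round(approx_power(8.0, s, 20), 3) for s in (5.0, 10.0, 20.0)])
```

Because δ and σ enter only through the ratio δ/σ, doubling the difference has the same effect on power as halving the standard deviation.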

The greater the sample size, the higher the power; this is the principle illustrated in the previous graph. We also learned in the preceding section that anything that increases the value of the denominator of the test statistic decreases the power. Conversely, a larger sample size gives us more data to analyze, which decreases the denominator of the test statistic and therefore increases the power.


VI. CONCLUSION 


In our comprehensive analysis, we compared SAS and R in replicating the dental data results, focusing on the AIC and BIC calculations, which revealed subtle theoretical differences but a practical divergence only in the size of the penalty. Through 5000 simulations, we examined the autoregressive structure and factors such as variance, correlation, standard deviation, mean, and number of repeated-measures groups, assessing their impact on the power of formal comparisons between groups in a linear mixed model. Our findings highlight that as the Type I error rate and the difference between the means increase, so does the power, while an increase in variability within groups decreases it. We also examined directionality in the power analysis, which showed that power decreases as variability increases, particularly in our sample data with considerable variability. The study also emphasized that the greater the sample size, the higher the power, a principle reflected in our graphs. This project has provided valuable insights into the complex interplay of statistical factors in mixed models, contributing to a deeper understanding of error types, effect sizes, and the influence of sample size and variability on statistical power.